Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures
Authors: M. Baboulin, J. Dongarra, A. Remy, S. Tomov and I. Yamazaki
Abstract
Main Track
1. Comparison of Large Graphs Using Distance Information. W. Czech, W. Mielczarek and W. Dzwinel
2. Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures. M. Baboulin, J. Dongarra, A. Remy, S. Tomov and I. Yamazaki
3. Fuzzy Transducers as a Tool for Translating Noisy Data in Electrical Load Forecast System. M. Flasiński, J. Jurek and T. Peszek
4. Distributed Computing Infrastructure as a Tool for e-Science. J. Kitowski, K. Wiatr, Ł. Dutka, M. Twardy, T. Szepieniec, M. Sterzel, R. Słota and R. Pająk
5. LU Preconditioning for Overdetermined Sparse Least Squares Problems. G. Howell and M. Baboulin
6. FEniCS-HPC: Automated predictive high-performance finite element computing. J. Jansson, J. Hoffman and N. Jansson
7. Parallel implementation of the FETI DDM constraint matrix on top of PETSc for the PermonFLLOP package. A. Vasatova, M. Cermak and V. Hapla
8. A lightweight approach for deployment of scientific workflows in cloud infrastructures. B. Balis, M. Bubak, K. Figiela, M. Malawski and M. Pawlik
9. Fast Incremental Community Detection on Dynamic Graphs. A. Zakrzewska and D. Bader
10. A Parallel Multi-Threaded Solver for Symmetric Positive Definite Bordered Band Linear Systems. P. Benner, P. Ezzatti, E. S. Quintana-Orti and A. Remon-Gomez
11. Optimized Parallel Model Of Human Detection Based On The Multi-Scale Covariance Descriptor. N. Abid, T. Ouni, K. Loukil, A. C. Ammari and M. Abid
12. Performance analysis of the Kahan-enhanced scalar product on current multicore processors. J. Hofmann, D. Fey, J. Eitzinger, G. Hager, G. Wellein and M. Riedmann
13. Parallel Extremal Optimization with Guided Search and Crossover Applied to Load Balancing. M. Tudruj and E. Laskowski
14. A Parallel Algorithm for LZW decompression, with GPU implementation. S. Funasaka, K. Nakano and Y. Ito
15. Performance Analysis of the Chebyshev Basis Conjugate Gradient Method on the K Computer. Y. Kumagai, A. Fujii, T. Tanaka, Y. Hirota, T. Fukaya, T. Imamura and R. Suda
16. Comparative Performance Analysis of Coarse Solvers for Algebraic Multigrid on Leading Multicore Architectures. A. Druinsky, P. Ghysels, S. Li, O. Marques, S. Williams, A. Barker, D. Kalchev and P. Vassilevski
17. GPU Accelerated Simulations of Magnetic Resonance Imaging of Vascular Structures. K. Jurczuk, D. Murawski, M. Kretowski and J. Bezy-Wendling
18. A Parallel FDFM Approach for Breaking Weak RSA Keys using the FPGA. X. Zhou, K. Nakano and Y. Ito
19. Massively Parallel Approach to Sensitivity Analysis on HPC Architectures by using Scalarm Platform. D. Bachniak, J. Liput, L. Rauch, R. Słota and J. Kitowski
20. Experimental Optimization of Parallel 3D Overlapping Domain Decomposition Schemes. S. Guzzetti, A. Veneziani and V. Sunderam
21. Accelerating NWChem Coupled Cluster through dataflow-based Execution. H. McCraw, A. Danalis, G. Bosilca and J. Dongarra
22. A Diffusion Process for Graph Partitioning: its Solutions and their Refinement. A. Jocksch
23. Metadata Organization and Management for Globalization of Data Access with onedata. M. Wrzeszcz, T. Lichoń, R. Słota, K. Zemek, K. Trzepla, Ł. Opioła, D. Nikolow, L. Dutka, R. Slota and J. Kitowski
24. Exploring Memory Error Vulnerability for Parallel Programming Models. I. Öz, M. Gil, G. Utrera and X. Martorell
25. Experience on vectorizing Lattice Boltzmann kernels for multi- and many-core architectures. E. Calore, N. Demo, S. F. Schifano and R. Tripiccione
26. An Approach for Ensuring Reliable Functioning of a Supercomputer Based on a Formal Model. A. Antonov, D. Nikitenko, P. Shvets, S. Sobolev, K. Stefanov, V. Voevodin, V. Voevodin and S. Zhumatiy
27. Parallel Induction of Nondeterministic Finite Automata. T. Jastrząb, Z. J. Czech and W. Wieczorek
28. Synthetic Signature Program for Performance Scalability. J. Panadero, A. Wong, D. Rexachs and E. Luque
29. Parallel differential evolution in the PGAS programming model implemented with PCJ Java library. Ł. Górski, F. Rakowski and P. Bala
30. Parallelization and Optimization of a CAD Model Processing Tool from the Automotive Industry to Distributed Memory Parallel Computers. L. F. Ayuso, J. J. Durillo, T. Fahringer, B. Kornberger and M. Schifko
31. A bucket sort algorithm for the particle-in-cell method on manycore architectures. A. Jocksch, F. Hariri, T. M. Tran, S. Brunner, C. Gheller and L. Villard
32. Energy efficient calculations of text similarity measure on FPGA-accelerated computing platforms. M. Karwatowski, P. Russek, M. Wielgosz, S. Koryciak and K. Wiatr
33. Implementing deep learning algorithms on graphics processor units. K. Grzegorczyk, M. Kurdziel and P. Wójcik
34. Sparse matrix multiplication on dataflow engines. V. Simic, N. Savic, V. Ciric and I. Milentijevic
35. Scalable Distributed Two-Layer Block Based Datastore. A. Krechowicz, S. Deniziak, M. Bedla, A. Chrobot and G. Łukawski
36. Adaptation of Deep Belief Network to modern multicore architectures. T. Olas, W. K. Mleczko and R. K. Nowicki
37. Parallel Algorithms for Wireless LAN Planning. A. Gnatowski and J. Kwiatkowski
38. Accelerating Sparse Arithmetic in the Context of Newton's Method for Small Molecules with Bond Constraints. C. C. K. Mikkelsen, J. Alastruey-Benede, P. Ibanez-Marin and P. G. Risueno
39. Toward parallel implementation of numerical model of solidification based on the generalized finite difference method using Intel Xeon Phi. A. Kulawik, L. Szustak, K. Halbiniak, J. Wrobel and P. Gepner
Similar resources
Non-GPU-resident symmetric indefinite factorization
Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, USA. Correspondence: Ichitaro Yamazaki, Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, USA. Funding information: National Science Foundation, NVIDIA, Matrix Algebra for GPU and Multicore Architectures (MAGMA) for Large Petascale Sys...
Solving dense symmetric indefinite systems using GPUs
This paper studies the performance of different algorithms for solving a dense symmetric indefinite linear system of equations on multicore CPUs with a Graphics Processing Unit (GPU). To ensure the numerical stability of the factorization, pivoting is required. Obtaining high performance of such algorithms on the GPU is difficult because all the existing pivoting strategies lead to frequent syn...
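The snippet above notes that pivoting is required for a numerically stable factorization. The following minimal pure-Python sketch shows why. The function name `ldlt_no_pivot` is hypothetical and is not taken from any of the listed papers or from MAGMA. It performs an unpivoted LDL^T factorization. It succeeds on the indefinite matrix [[4, 2], [2, -3]] but breaks down on [[0, 1], [1, 0]], which is nonsingular yet has a zero leading pivot.

```python
def ldlt_no_pivot(a):
    """Unpivoted LDL^T factorization of a symmetric matrix (illustrative sketch).

    `a` is a list of lists. Returns (L, d): L is unit lower triangular and
    d holds the diagonal of D, so that A = L * diag(d) * L^T.
    Raises ValueError when a pivot d[j] is exactly zero.
    """
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Pivot: d_j = a_jj - sum_{k<j} L_jk^2 * d_k
        d[j] = a[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        if d[j] == 0.0:
            raise ValueError(f"zero pivot at step {j}: pivoting required")
        for i in range(j + 1, n):
            # Multiplier: L_ij = (a_ij - sum_{k<j} L_ik * L_jk * d_k) / d_j
            s = a[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = s / d[j]
    return L, d


if __name__ == "__main__":
    # Indefinite (d has mixed signs), but every leading pivot is nonzero.
    L, d = ldlt_no_pivot([[4.0, 2.0], [2.0, -3.0]])
    print(L, d)  # [[1.0, 0.0], [0.5, 1.0]] [4.0, -4.0]

    # Nonsingular, yet the (0, 0) pivot is zero, so the factorization breaks down.
    try:
        ldlt_no_pivot([[0.0, 1.0], [1.0, 0.0]])
    except ValueError as err:
        print(err)  # zero pivot at step 0: pivoting required
```

Tiny nonzero pivots are just as harmful in floating point, because they produce large multipliers and element growth. LAPACK's `?sytrf` routines avoid both problems with Bunch-Kaufman diagonal pivoting. This method applies symmetric row and column interchanges and allows 1x1 or 2x2 diagonal blocks in D. For [[0, 1], [1, 0]], the whole matrix becomes a single 2x2 pivot block.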
Automatically Tuned Dense Linear Algebra for Multicore+GPU
The Multicore+GPU architecture has been adopted in some of the fastest supercomputers listed on the TOP500. The MAGMA project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures processors like Multicore+GPU. However, to provide portable performance, manual parameter tuning is required. This paper presents automatically tuned LU factorizat...
One-sided dense matrix factorizations on a multicore with multiple GPU accelerators in MAGMA
One-sided dense matrix factorizations are important computational kernels in many scientific and engineering simulations. In this paper, we propose two extensions of both right-looking (LU and QR) and left-looking (Cholesky) factorization algorithms to utilize the computing power of current heterogeneous architectures. We first describe a new class of non-GPU-resident algorithms that factorize ...
One-sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators
One-sided dense matrix factorizations are important computational kernels in many scientific and engineering simulations. In this paper, we propose two extensions of both right-looking (LU and QR) and left-looking (Cholesky) one-sided factorization algorithms to utilize the computing power of current heterogeneous architectures. We first describe a new class of non-GPU-resident algorithms that ...
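The two abstracts above distinguish right-looking and left-looking factorizations. As an illustration only, the sketch below applies both loop orderings to Cholesky so that the ordering is the only difference. The function names are hypothetical, the code is unblocked and pure Python, and there is no pivoting, so the input must be symmetric positive definite.

* The right-looking variant subtracts each finished column's contribution from the whole trailing submatrix immediately.
* The left-looking variant postpones all updates to a column until just before that column is factored.

Both variants yield the same factor L.

```python
import math


def cholesky_right_looking(a):
    """Right-looking (eager) Cholesky: once column j is finished, its
    contribution is subtracted from the whole trailing submatrix."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy; only the lower triangle is used
    for j in range(n):
        a[j][j] = math.sqrt(a[j][j])
        for i in range(j + 1, n):
            a[i][j] /= a[j][j]
        # Symmetric rank-1 update of the trailing lower triangle.
        for k in range(j + 1, n):
            for i in range(k, n):
                a[i][k] -= a[i][j] * a[k][j]
    return [[a[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


def cholesky_left_looking(a):
    """Left-looking (lazy) Cholesky: column j receives the updates from all
    previously computed columns only when it is about to be factored."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, n):  # i == j comes first, so L[j][j] is set before use
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


if __name__ == "__main__":
    A = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
    print(cholesky_right_looking(A))  # [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]
    print(cholesky_left_looking(A) == cholesky_right_looking(A))  # True
```

The listed papers operate on panels of columns rather than single columns, and this sketch does not model their CPU/GPU scheduling. What the ordering changes is when the updates are applied and which part of the matrix they touch, not the computed factor.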
Journal title:
Volume and issue:
Pages: -
Publication date: 2015